Skip to content

feat: add column mapping support for add_column#2389

Open
william-ch-databricks wants to merge 2 commits intodelta-io:mainfrom
william-ch-databricks:stack/alter-table-5-column-mapping-add
Open

feat: add column mapping support for add_column#2389
william-ch-databricks wants to merge 2 commits intodelta-io:mainfrom
william-ch-databricks:stack/alter-table-5-column-mapping-add

Conversation

@william-ch-databricks
Copy link
Copy Markdown
Collaborator

@william-ch-databricks william-ch-databricks commented Apr 14, 2026

🥞 Stacked PR

Use this link to review incremental changes.


What changes are proposed in this pull request?

Extends ALTER TABLE ... ADD COLUMN to work on tables with column mapping enabled. New columns get a fresh column ID and a UUID-based physical name assigned, and delta.columnMapping.maxColumnId is updated in table properties.

How was this change tested?

  • Unit tests covering name mode, id mode, and the missing-maxColumnId error case.
  • Integration tests that add a column to a column-mapping table and verify the metadata is assigned correctly, plus round-trip data read after add.

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-5-column-mapping-add branch 17 times, most recently from 36cd0da to d943ca2 Compare April 17, 2026 21:15
github-merge-queue Bot pushed a commit that referenced this pull request Apr 17, 2026
…2385)

## Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2385/files) to
review incremental changes.
-
[**stack/alter-table-1-refactor-state**](#2385)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2385/files)]
-
[stack/alter-table-2-supports-data-files](#2386)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2386/files/029d6672081d32caea314dca61f3688e841f3036..50130c30c4378997fbf3d0ef7426ce23992e5c85)]
-
[stack/alter-table-3-framework-add-column](#2387)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2387/files/50130c30c4378997fbf3d0ef7426ce23992e5c85..eaa5277d025434aa78b79beed1a7cedfe82aa621)]
-
[stack/alter-table-4-set-nullable](#2388)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2388/files/eaa5277d025434aa78b79beed1a7cedfe82aa621..7506e1273ef349fbf628e4a6db5d0e7c1b4cbd8b)]
-
[stack/alter-table-5-column-mapping-add](#2389)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2389/files/7506e1273ef349fbf628e4a6db5d0e7c1b4cbd8b..d943ca2945da1ec9c354db57c889d93456381dee)]
-
[stack/alter-table-6-drop-column](#2390)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2390/files/d943ca2945da1ec9c354db57c889d93456381dee..40e3a1ff9e2915b456d0f0a76a50eb67f27a1ab1)]
-
[stack/alter-table-7-rename-column](#2391)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2391/files/40e3a1ff9e2915b456d0f0a76a50eb67f27a1ab1..628af3c02767f008a4e3c80a146aec7e5a3ac0b3)]

---------
## What changes are proposed in this pull request?

Splits Transaction's snapshot into two concerns:
- `read_snapshot_opt: Option<SnapshotRef>` -- the pre-commit table state
(None for CREATE TABLE)
- `effective_table_config: TableConfiguration` -- the config this commit
will produce

This separates "what did I read?" (conflict detection, post-commit
snapshots) from "what will
this commit produce?" (schema, protocol, stats, write context).
Write-path call sites read from
`effective_table_config`; read-path call sites use `read_snapshot()`.

Also adds `should_emit_protocol` / `should_emit_metadata` flags to
replace the old
`is_create_table()` checks for Protocol/Metadata action emission, and
replaces the synthetic
pre-commit snapshot in CREATE TABLE with direct `TableConfiguration`
construction.

This is a pure refactor with no behaviour change.

## How was this change tested?

All existing tests pass. Added unit tests for
`LogSegment::new_for_version_zero` (valid input,
non-zero version rejection, non-commit file rejection).
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-5-column-mapping-add branch 4 times, most recently from 19fbe4b to ca02602 Compare April 22, 2026 05:06
@william-ch-databricks william-ch-databricks marked this pull request as ready for review April 22, 2026 05:07
Comment thread kernel/src/transaction/builder/alter_table.rs Outdated
Comment thread kernel/src/transaction/builder/alter_table.rs Outdated
Comment thread kernel/src/transaction/builder/alter_table.rs Outdated
Comment thread kernel/src/transaction/builder/alter_table.rs Outdated
Comment thread kernel/src/transaction/builder/alter_table.rs Outdated
Comment thread kernel/src/transaction/schema_evolution.rs Outdated
Comment thread kernel/src/transaction/builder/alter_table.rs
Comment thread kernel/src/actions/mod.rs Outdated
Comment thread kernel/tests/alter_table.rs
Comment thread kernel/tests/alter_table.rs
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-5-column-mapping-add branch 3 times, most recently from 74af2a0 to 2133e27 Compare April 23, 2026 03:39
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-5-column-mapping-add branch 2 times, most recently from cb5ca4e to 28be97c Compare April 23, 2026 20:22
ethan-tyler pushed a commit to ethan-tyler/delta-kernel-rs that referenced this pull request Apr 23, 2026
…a-io#2386)

## Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/2386/files) to
review incremental changes.
-
[stack/alter-table-1-refactor-state](delta-io#2385)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2385/files)]
[MERGED]
-
[**stack/alter-table-2-supports-data-files**](delta-io#2386)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2386/files)]
-
[stack/alter-table-3-framework-add-column](delta-io#2387)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2387/files/a71c8b8c83d33b1ab882cae984973a76aafc3a31..6dd1b32e8c5163018f2c836f733b58fc4bdc2dc2)]
-
[stack/alter-table-4-set-nullable](delta-io#2388)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2388/files/6dd1b32e8c5163018f2c836f733b58fc4bdc2dc2..d121f77a73a37f12421c42dc5c4a5626ad2a1e10)]
-
[stack/alter-table-5-column-mapping-add](delta-io#2389)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2389/files/d121f77a73a37f12421c42dc5c4a5626ad2a1e10..cb5ca4e01024e50ca6b5b0576bbeb9bb6b184d17)]
-
[stack/alter-table-6-drop-column](delta-io#2390)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2390/files/cb5ca4e01024e50ca6b5b0576bbeb9bb6b184d17..6e91f960109a4d4db662e417780bb4beaac88f5f)]
-
[stack/alter-table-7-rename-column](delta-io#2391)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2391/files/6e91f960109a4d4db662e417780bb4beaac88f5f..300d676946d785eb195018e2a9ee696c1f6623a6)]
-
[stack/alter-table-8-operation-string](delta-io#2447)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2447/files/300d676946d785eb195018e2a9ee696c1f6623a6..fee7c029d62cfb55b809ced35c0cb0553ddaf3e7)]
-
[stack/alter-table-9-is-blind-append](delta-io#2448)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/2448/files/fee7c029d62cfb55b809ced35c0cb0553ddaf3e7..3d44b51e0d037c8f33ffbece2d7deecd800d5544)]

---------
## What changes are proposed in this pull request?

Introduce a `SupportsDataFiles` marker trait to gate data-file methods
at compile time. This prevents future metadata-only transaction types
(like `AlterTable`) from accessing methods that add, remove, or
configure data files.

**Gated behind `SupportsDataFiles` (only `ExistingTable` and
`CreateTable`):**

| Method | Reason |
|--------|--------|
| `stats_schema()` | Returns table-specific stats schema for writing
files |
| `stats_columns()` | Returns column names needing stats collection |
| `generate_logical_to_physical()` | Builds partition column transform
for file writing |
| `get_write_context()` | Returns full write context (target dir,
schemas, column mapping) |
| `add_files()` | Queues file additions |

**Kept on `impl<S> Transaction<S>` (all transaction types):**

| Method | Reason |
|--------|--------|
| `commit()` | Shared commit path; handles empty data gracefully |
| `with_data_change()` / `set_data_change()` | ALTER TABLE sets
`data_change: false` |
| `with_engine_info()` | Applicable to all commits |
| `with_commit_info()` | Applicable to all commits |
| `with_transaction_id()` | Applicable to all commits |
| `with_domain_metadata()` | Schema changes may need domain metadata |
| `add_files_schema()` | Read-only accessor; used by `generate_adds()`
which handles empty state internally. Kept ungated so it remains
available if it becomes dynamic in the future. |
| `validate_add_files_stats()` | Internal; handles empty data with early
return |
| `generate_adds()` | Internal; returns early when no files queued |
| `generate_remove_actions()` | Internal; returns early when no files
queued |
| `generate_dv_update_actions()` | Internal; returns early when no DV
updates |
| `generate_adds_for_dv_update()` | Internal; only called from
`generate_dv_update_actions` |
| `build_crc_delta()` | Internal; shared commit path |
| `into_committed()` / `into_conflicted()` / `into_retryable()` |
Internal; shared commit path |

## How was this change tested?

All existing tests pass unchanged. The trait adds compile-time safety
with no runtime behavior change. A `compile_fail` doctest will be added
in the follow-up PR that introduces `AlterTable` to verify that data
file methods are rejected at compile time.
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-5-column-mapping-add branch from 28be97c to 80b72bb Compare April 23, 2026 22:03
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 96.69604% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.47%. Comparing base (6486bd2) to head (6a7ccf9).

Files with missing lines Patch % Lines
kernel/src/transaction/schema_evolution.rs 96.38% 6 Missing and 4 partials ⚠️
kernel/src/actions/mod.rs 96.36% 0 Missing and 2 partials ⚠️
kernel/src/schema/mod.rs 90.47% 1 Missing and 1 partial ⚠️
kernel/src/transaction/builder/alter_table.rs 94.44% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2389      +/-   ##
==========================================
+ Coverage   88.40%   88.47%   +0.06%     
==========================================
  Files         178      178              
  Lines       58235    58669     +434     
  Branches    58235    58669     +434     
==========================================
+ Hits        51485    51905     +420     
- Misses       4783     4790       +7     
- Partials     1967     1974       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-5-column-mapping-add branch 2 times, most recently from f396c64 to ae41dcb Compare April 24, 2026 17:43
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-5-column-mapping-add branch 2 times, most recently from 41651eb to a2b5b8b Compare April 26, 2026 22:13
Copy link
Copy Markdown
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome! Left some comments.

Comment thread kernel/src/actions/mod.rs
///
/// Returns [`Error::InvalidProtocol`] if the property is present but not a non-negative
/// integer.
pub(crate) fn max_column_id(&self) -> DeltaResult<Option<i64>> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have this already as a TableProperty? It should parse it for us already, though I'm not sure it does the non-negative check.

if has_id || has_physical_name {
return Err(Error::generic(format!(
"Field '{}' already has column mapping metadata. \
Pre-existing column mapping metadata is not supported for CREATE TABLE.",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this just be a loop where we iterate over the two fields, look each one up, and then return a very specific error message for that exact field instead of referring to both fields in the error message?

}

/// Returns the largest `delta.columnMapping.id` found anywhere in `schema`, including nested
/// data types. Returns `0` if no field carries a column mapping ID.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this function set the default, or should we return an Option and let the caller set the default to zero? I defer to you.

pub(crate) struct SchemaEvolutionResult {
/// The evolved schema after all operations are applied.
pub schema: SchemaRef,
/// `Some(id)` if column-mapped columns were added and `delta.columnMapping.maxColumnId`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If column mapping is enabled, what's an example of a non-column-mapped column?

This sentence structure is a bit confusing. It's probably best to just say: if Some then X must be updated to this new value.

///
/// The field must not already exist in the schema (case-insensitive). The field must be
/// nullable because existing data files do not contain this column and will read NULL for it.
/// `field` must not carry `delta.columnMapping.id` or `delta.columnMapping.physicalName`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field and any of its nested fields must not carry

let cm_enabled = column_mapping_mode != ColumnMappingMode::None;

// Take max of persisted `maxColumnId` and schema-walked max to guard against a
// non-conforming writer leaving the property stale. Matches delta-spark's `findMaxColumnId`.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete Matches delta-spark's findMaxColumnId.

@@ -146,14 +161,6 @@ pub(crate) fn apply_schema_operations(
// `DELTA_FEATURES_REQUIRE_MANUAL_ENABLEMENT` and requires the user to enable the
// feature explicitly before adding such a column.
SchemaOperation::AddColumn { field } => {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From claude:

  On a non-CM table, an AddColumn whose top-level field is clean but an inner struct field carries delta.columnMapping.id/physicalName will pass this check. The error is still caught later by make_physical/validate_schema_column_mapping, so this is not a soundness bug, but the error surfaces far from the source.

  Either make reject_preexisting_cm_metadata recursive (preferred — symmetric with assign_field_column_mapping which is recursive), or document explicitly that callers rely on validate_schema_column_mapping for nested cases.

// For CREATE TABLE, reject any pre-existing column mapping metadata.
// This avoids conflicts between user-provided IDs/physical names and the ones we assign.
// ALTER TABLE (adding columns) will need different handling in the future.
// TODO: Also check for nested column IDs (`delta.columnMapping.nested.ids`) once
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add this TODO back? cc @dengsh12 FYI

#[rstest]
#[case::one_column(1)]
#[case::three_columns(3)]
#[case::no_cm_one_column(None, 1)]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do the cross-product more easily by just using rstest values instead of rstest case.

}
}
assert!(max_column_id(&reloaded) > orig);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Else, if column mapping is not enabled, please add an assertion that metadata().configuration().get("delta.columnMapping.maxColumnId").is_none()

@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-5-column-mapping-add branch from a2b5b8b to 957a912 Compare April 27, 2026 06:19
Implement the SetNullable schema operation which changes a column's
nullability from NOT NULL to nullable. If the column is already nullable,
this is a no-op.

Only the safe direction is allowed (NOT NULL -> nullable) per the Delta
protocol.
When column mapping is enabled (mode = name or id), newly added columns
via ALTER TABLE ADD COLUMN are now assigned column mapping metadata
(unique ID and UUID-based physical name). The maxColumnId table property
is updated in the evolved metadata configuration.

Reuses the existing assign_field_column_mapping function (now pub(crate))
which handles recursive assignment for nested types.
@william-ch-databricks william-ch-databricks force-pushed the stack/alter-table-5-column-mapping-add branch from 957a912 to 6a7ccf9 Compare April 27, 2026 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants